Abstract

The extension of the Dezert-Smarandache theory (DSmT) to the multi-class framework has a feasible computational complexity for various applications when the number of classes is limited, typically to two classes. In contrast, when the number of classes is large, the DSmT incurs a high computational complexity. This paper investigates the effective use of the DSmT for multi-class classification in conjunction with Support Vector Machines using the One-Against-All (OAA) implementation, which offers two advantages: firstly, it models partial ignorance by including the complementary classes in the set of focal elements during the combination process and, secondly, it drastically reduces the number of focal elements through a supervised model that introduces exclusivity constraints when classes are naturally and mutually exclusive. To illustrate the effective use of the DSmT for multi-class classification, two SVM-OAA implementations are combined in three steps: transformation of the SVM classifier outputs into posterior probabilities using Platt's sigmoid technique, estimation of masses directly through the proposed model, and combination of masses through the Proportional Conflict Redistribution (PCR6) rule. To demonstrate the effectiveness of the proposed framework, a case study is conducted on handwritten digit recognition. Experimental results show that it is possible to efficiently reduce both the number of focal elements and the classification error rate.

Keywords: Handwritten digit recognition; Support Vector Machines; Dezert-Smarandache theory; Belief assignments;
Conflict management 

1. Introduction 

Nowadays, a large number of classifiers and feature generation methods have been developed in various application areas of pattern recognition [Duda, 2001], [Cher, 2007]. Nevertheless, no method has shown undisputed superiority over the others, either in the feature generation step or in the classification step. Rather than trying to optimize a single classifier by choosing the best features for a given problem, researchers have found it more fruitful to combine recognition methods [Rahm, 2003], [Cher, 2007]. Indeed, combining classifiers makes it possible to exploit the redundant and complementary nature of the responses issued from different classifiers.

Researchers have proposed increasingly numerous and varied approaches for combining classifiers, which has led to the development of several schemes that treat data in different ways [Rahm, 2003], [Cher, 2007]. Generally, three approaches for combining classifiers can be considered: the parallel approach, the sequential approach and the hybrid approach [Cher, 2007]. Furthermore, these combinations can be performed at the class level, at the rank level, or at the measure level [Xu, 1992], [Jain, 2000], [Ruta, 2000], [Abba, 2012c].
In many applications, various constraints prevent an efficient joint use of classifiers and feature generation methods, leading to inaccurate performance. Thus, an appropriate operating method based on mathematical approaches is needed, one that takes into account two notions: the uncertainty and the imprecision of the classifiers' responses. In general, most theoretical advances devoted to probability theory are able to represent uncertain knowledge but cannot easily model information that is imprecise, incomplete, or not totally reliable. Moreover, they often conflate the concepts of uncertainty and imprecision within the probability measure. Therefore, new theories dealing with uncertain and imprecise information have been introduced, such as fuzzy set theory [Zade, 1968], evidence theory [Demp, 1967], [Shaf, 1976], possibility theory [Dubo, 1988] and, more recently, the theory of plausible and paradoxical reasoning [Smar, 2004], [Smar, 2006], [Smar, 2009].

The evidence theory initiated by Dempster and Shafer, termed Dempster-Shafer theory (DST) [Demp, 1967], [Shaf, 1976], is generally recognized as a convenient and flexible alternative to the Bayesian theory of subjective probability [Shaf, 1990]. The DST is a powerful theoretical tool that has been applied in many kinds of applications [Smet, 1999] for the representation of incomplete knowledge, for belief updating and for the combination of evidence [Prov, 1992], [Dubo, 1992] through Dempster's rule of combination. Indeed, it offers a simple and direct representation of ignorance and has a low computational complexity [Rusp, 1992] for most practical applications.

Nevertheless, this theory presents some weaknesses and limitations, mainly when the combined sources of evidence become highly conflicting. Furthermore, Shafer's model itself does not necessarily hold in some fusion problems involving paradoxical information. To overcome these limitations, a more recent theory of plausible and paradoxical reasoning, known in the literature as the Dezert-Smarandache theory (DSmT), was developed by Jean Dezert and Florentin Smarandache for dealing with imprecise, uncertain and paradoxical sources of information. The main objective of the DSmT was to introduce combination rules that correctly combine evidence issued from different information sources, even in the presence of conflicts between sources or of constraints corresponding to an appropriate model (free or hybrid DSm models [Smar, 2004]). The DSmT has proved its efficiency in many pattern recognition application areas such as remote sensing [Corg, 2003], [Maup, 2004], [Elha, 2011], [Zhun, 2012], identification and tracking [Pann, 2008], [Pann, 2009], [Kech, 2009], [Sun, 2010], [Deze, 2010], [Pann, 2011], biometrics [Sing, 2008], [Vats, 2009a], [Vats, 2009b], [Vats, 2010], computer vision [Garc, 2008], [Khod, 2010], [Deze, 2011], robotics [Huan, 2006], [Li, 2006a], [Li, 2006b], [Li, 2007], [Li, 2008], [Huan, 2009] and, more recently, handwriting recognition applications [Abba, 2012a], [Abba, 2012b], [Abba, 2012c], as well as many others [Smar, 2004], [Smar, 2006], [Smar, 2009].

The use of the DSmT for multi-class classification has a feasible computational complexity for various applications when the number of classes is limited, typically to two classes [Abba, 2012a]. In contrast, when the number of classes is large, the DSmT incurs a high computational complexity, closely related to the number of elements to be processed. Indeed, an analytical expression derived by Tombak et al. [Tomb, 2001] shows that the number of elements to be processed is given by the Dedekind numbers minus one [Dede, 1897], [Comt, 1974]: 1, 2, 5, 19, 167, 7580, 7828353, ... For instance, if the frame of discernment contains 8 classes, the number of elements to be handled in the DSmT framework is $\approx 5.6 \times 10^{22}$. Hence, it is not practical to consider all the subsets of the original classes (closed under the union and intersection operators), since the problem becomes intractable for more than 6 elements in the frame of discernment [Deze, 2004a]. Dezert and Smarandache [Deze, 2004b] proposed a first approach for ordering all the elements generated under the free DSm model for matrix calculus, as done in the DST framework [Kenn, 1992], [Smet, 2002]. However, this proposal has limitations, since in practical applications it is more appropriate to manipulate only the focal elements [Deno, 2001], [Djik, 2004], [Mart, 2006], [Abba, 2012c].
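The growth quoted above can be checked for small frames. The minimal Python sketch below (the helper `hyper_power_set_size` is hypothetical) encodes each element of the hyper-power set by the non-empty Venn-diagram regions it covers, closes the singletons under union and intersection as in the definition of $D^\Theta$ under the free DSm model, and counts the result including the empty set. For n = 1, ..., 4 it reproduces the terms 2, 5, 19, 167 of the sequence quoted above, i.e., the Dedekind numbers 3, 6, 20, 168 minus one. Brute-force closure is only practical up to about n = 4, which itself illustrates the intractability.

```python
def hyper_power_set_size(n: int) -> int:
    """Count |D^Theta| for n classes under the free DSm model (fast only for n <= 4)."""
    # Each non-empty Venn region is labelled by the bitmask of the classes overlapping in it.
    regions = range(1, 2 ** n)
    # An element of D^Theta is encoded as a bitmask over the regions it covers.
    singletons = []
    for i in range(n):
        mask = 0
        for r in regions:
            if r & (1 << i):
                mask |= 1 << r
        singletons.append(mask)

    # Close the singletons under union (|) and intersection (&).
    elements = set(singletons)
    frontier = list(elements)
    while frontier:
        discovered = []
        for a in frontier:
            for b in list(elements):
                for c in (a | b, a & b):
                    if c not in elements:
                        elements.add(c)
                        discovered.append(c)
        frontier = discovered

    elements.add(0)  # the empty set belongs to D^Theta by definition
    return len(elements)


for n in range(1, 5):
    print(n, hyper_power_set_size(n))  # prints 1 2, 2 5, 3 19, 4 167
```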

Only a few works have focused on the computational complexity of the combination algorithms formulated in the DSmT framework. Djiknavorian and Grenier [Djik, 2004] showed that the high complexity of the DSm hybrid (DSmH) combination algorithm can be avoided by designing code that performs a complete DSmH combination in a very short time. However, even though they obtained an optimal process for evaluating the DSmH algorithm, some parts of their code are not optimized, and the code was developed only for dynamic fusion. Martin [Mart, 2009] further proposed a practical codification of the focal elements that assigns a single integer to each part of the Venn diagram representing the discernment space. Contrary to Smarandache's codification [Deze, 2004a] used in [Deze, 2004c] and to the codes proposed in [Djik, 2004], the author argues that the constraints given by the application must be integrated directly into the codification of the focal elements in order to obtain a reduced discernment space. Therefore, this codification can drastically reduce the number of possible focal elements, and hence the complexity of both the DST and DSmT frameworks. A disadvantage of this codification is that the complexity increases drastically with the number of combined sources, especially for multi-class problems. To address this issue, Li et al. [Li, 2011] proposed a criterion called the evidence supporting measure of similarity (ESMS), which selects, among all available sources, only a subset of sources of evidence in order to reduce the complexity of the combination process. However, this criterion has been justified only for a two-class problem. Reducing both the number of combined sources and the size of the discernment space therefore remains an open research challenge.



In many pattern recognition applications, the classes belonging to the discernment space are naturally mutually exclusive, as in biometrics [Sing, 2008], [Vats, 2009a], [Vats, 2009b], [Vats, 2010] and handwriting recognition applications [Abba, 2012a], [Abba, 2012b], [Abba, 2012c]. Several classification methods have been proposed for such applications, such as template matching techniques [Deng, 1999], [Fang, 2003], [Guo, 2001], minimum distance classifiers [Fang, 2001], [Sabo, 1997], support vector machines (SVMs) [Just, 2005], hidden Markov models (HMMs) [Just, 2001], [Just, 2005], [Coet, 2004] and neural networks [Kaew, 1999], [Quek, 2002]. In various pattern recognition applications, SVMs have outperformed other classifiers since the mid-1990s [Cher, 2007]. The SVM is based on an optimization approach that separates two classes by a hyperplane. In the context of multi-class classification, a direct formulation of this optimization is possible [Weston, 1998] but very costly in computation time. Hence, two preferable multi-class implementations combining several binary SVMs have been proposed: One-Against-All (OAA) and One-Against-One (OAO) [West, 1998], [Guer, 1999], [Hsu, 2001]. The former is the most commonly used implementation in the context of multi-class classification with binary SVMs; it constructs n SVMs to solve an n-class problem [Bott, 1994]. Each SVM is designed to separate a simple class $\theta_i$ from all the others, i.e., from the corresponding complementary class $\bar{\theta}_i = \bigcup_{0 \le j \le n-1,\, j \ne i} \theta_j$. In contrast, the OAO implementation is designed to separate two simple classes $\theta_i$ and $\theta_j$ ($i \ne j$), which requires $n \times (n-1)/2$ SVMs. Hence, various decision functions can be used, such as the Decision Directed Acyclic Graph (DDAG) [Huan, 2002], which has the advantage of eliminating all possible unclassifiable data.
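To make the difference in the number of binary problems concrete, the following minimal Python sketch enumerates the tasks built by each implementation for n classes. The helper names `oaa_tasks` and `oao_tasks` are hypothetical, classes are simply indexed 0 to n - 1, and no SVM is trained here.

```python
from itertools import combinations


def oaa_tasks(n):
    """One-Against-All: one binary problem per class, theta_i versus its complement."""
    return [((i,), tuple(j for j in range(n) if j != i)) for i in range(n)]


def oao_tasks(n):
    """One-Against-One: one binary problem per unordered pair of classes (i < j)."""
    return [((i,), (j,)) for i, j in combinations(range(n), 2)]


n = 10  # e.g. the ten digit classes
print(len(oaa_tasks(n)), len(oao_tasks(n)))  # 10 45
print(oaa_tasks(3)[0])                       # ((0,), (1, 2))
```

For ten classes this gives 10 OAA problems against n(n - 1)/2 = 45 OAO problems, and each OAA problem explicitly exposes the complementary class of a singleton.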

Generally, the combination of binary classifiers is performed through very simple approaches such as the voting rule or the maximization of the decision function output by the classifiers. In this context, many combination operators can be used, especially in the DST framework [Martin, 2007]. In the same vein, some works have tried to combine binary classifiers derived from SVMs in the DST framework [Aregui, 2007], [Quost, 2007a]. For instance, the pairwise approach has been revisited by Quost et al. [Quost, 2005], [Quost, 2006], [Quost, 2007a], [Quost, 2007b] in the framework of the DST of belief functions for solving multi-class problems. In [Hu, 2005], Hu et al. used a DST-based combination method for combining multiple multi-class probability SVM classifiers in order to deal with distributed multi-source multi-class problems. Martin and Quidu proposed an original DST-based approach [Martin, 2008] for combining binary SVM classifiers using the OAO or OAA strategies, which provides decision support that helps experts characterize the seabed from sonar images. Burger et al. [Burger, 2006] applied a belief-based method for SVM fusion to hand shape recognition. Optimizing the fusion of the sub-classifications and dealing with undetermined cases due to uncertainty and doubt have been investigated in other works [Burger, 2008] through a simple method that combines the fusion methods of belief theories with SVMs. Recently, a regression-based approach [Laanay, 2010] has been proposed to predict membership or belief functions that correctly model the uncertainty and imprecision of data.

In this work, we investigate the effective use of the DSmT for multi-class classification in conjunction with the SVM-OAA implementation, which offers two advantages: firstly, it models partial ignorance by including the complementary classes in the set of focal elements, and hence in the combination process, contrary to the OAO implementation, which takes into account only the singletons; secondly, it drastically reduces the number of focal elements, from a number that grows as the Dedekind numbers to $2 \times n$. The reduction is performed through a supervised model using exclusivity constraints. Combining the outputs of SVMs within the DSmT framework requires that these outputs be transformed into membership degrees. Several methods for estimating mass functions have been proposed in both the DST and DSmT frameworks; they can be directly explicit, through special functions, or indirectly explicit, through transfer models [Demp, 1967], [Deno, 1997], [Smet, 1994], [Dubo, 1994], [Appr, 1999]. In our case, we propose a direct estimation method based on Platt's sigmoid transformation [Plat, 1999]. This allows us to satisfy the OAA implementation constraint.

The paper is organized as follows. Section 2 reviews the Proportional Conflict Redistribution (PCR6) rule based on the DSmT. Section 3 describes the combination methodology for multi-class classification using the SVM-OAA implementation. Experiments conducted on a dataset of isolated handwritten digits are presented in Section 4. The last section summarizes the proposed combination framework and outlines future research directions.

2. Review of PCR6 combination rule 

In pattern recognition, the multi-class classification problem is generally formulated as an n-class problem where the classes are associated with pattern classes, namely $\theta_0, \theta_1, \ldots, \theta_{n-1}$. The parallel combination of two classifiers, namely information sources $S_1$ and $S_2$, is performed through the PCR6 combination rule based on the DSmT. For an n-class problem, a reference domain, also called the discernment space, must be defined for performing the combination; it is composed of a finite set of exhaustive and mutually exclusive hypotheses.

In the context of probability theory, the discernment space, namely $\Theta$, is composed of n elements, $\Theta = \{\theta_0, \theta_1, \ldots, \theta_{n-1}\}$, and a mapping $m: \Theta \to [0, 1]$ associates with each class its mass, verifying $m(\emptyset) = 0$ and $\sum_{i=0}^{n-1} m(\theta_i) = 1$. In the Bayesian framework, combining two sources of information by means of the weighted mean and consensus-based rules seems effective for non-conflicting responses [Bloc, 2003], [Fren, 1985], [Cook, 1988], [Cook, 1991]. In the opposite case, an alternative approach has been developed in the DSmT framework to deal with (highly) conflicting, imprecise and uncertain sources of information [Smar, 2009]. An example of such approaches is the PCR6 rule.

The main concept of the DSmT is to distribute the unitary mass of certainty over all the composite propositions built from the elements of $\Theta$ with the $\cup$ (union) and $\cap$ (intersection) operators, instead of distributing it over the elementary hypotheses only. Therefore, the hyper-power set $D^\Theta$ is defined as:

1. $\emptyset, \theta_0, \theta_1, \ldots, \theta_{n-1} \in D^\Theta$.

2. If $A, B \in D^\Theta$, then $A \cap B \in D^\Theta$ and $A \cup B \in D^\Theta$.

3. No other elements belong to $D^\Theta$, except those obtained by using rules 1 or 2.



The DSmT uses the generalized basic belief mass, also known as the generalized basic belief assignment (gbba), computed on the hyper-power set of $\Theta$ and defined by a map $m(.): D^\Theta \to [0, 1]$ associated with a given source of evidence, which can support paradoxical information, as follows: $m(\emptyset) = 0$ and $\sum_{A \in D^\Theta} m(A) = 1$. The combined masses $m_{PCR6}$ obtained from $m_1(.)$ and $m_2(.)$ by means of the PCR6 rule [Smar, 2006], [Smar, 2009] are defined as:

$$ m_{PCR6}(A_i) = \begin{cases} 0 & \text{if } A_i \in \Phi, \\ m_{\wedge}(A_i) + \sum_{k=1}^{2} m_k(A_i)^2 \, L_k & \text{otherwise,} \end{cases} \tag{1} $$

where

$$ L_k = \sum_{\substack{Y \in D^\Theta \\ A_i \cap Y \in \Phi}} \frac{m_{\sigma_k(1)}(Y)}{m_k(A_i) + m_{\sigma_k(1)}(Y)} \tag{2} $$

Here, $\Phi = \{\Phi_M, \emptyset\}$ is the set of all relatively and absolutely empty elements, $\Phi_M$ is the set of all elements of $D^\Theta$ that have been forced to be empty in the hybrid model M defined by the exhaustive and exclusive constraints, $\emptyset$ is the empty set, the denominator $m_k(A_i) + m_{\sigma_k(1)}(Y)$ is different from zero, and $\sigma_k(1)$ counts from 1 to 2 avoiding k, i.e.:

$$ \sigma_k(1) = \begin{cases} 2 & \text{if } k = 1, \\ 1 & \text{if } k = 2. \end{cases} \tag{3} $$

The term $m_{\wedge}(A_i)$ represents a conjunctive consensus, also called the DSm Classic (DSmC) combination rule [Smar, 2006], [Smar, 2009], which is defined as:

$$ m_{\wedge}(A_i) = \begin{cases} 0 & \text{if } A_i = \emptyset, \\ \sum_{\substack{X, Y \in D^\Theta \\ X \cap Y = A_i}} m_1(X)\, m_2(Y) & \text{otherwise.} \end{cases} \tag{4} $$
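The rule in Eqs. (1)-(4) can be sketched for the case used later in this paper: two sources under Shafer's model, where an intersection belongs to $\Phi$ exactly when the two focal elements are disjoint. The minimal Python sketch below is our illustration, not the authors' code; it represents focal elements as `frozenset` objects (an assumption of this sketch), accumulates the conjunctive consensus, and splits each partial conflict $m_1(X) m_2(Y)$ between X and Y in proportion to $m_1(X)$ and $m_2(Y)$. For two sources, PCR6 coincides with PCR5.

```python
from itertools import product


def pcr6_two_sources(m1, m2):
    """Combine two mass functions with PCR6 under Shafer's model.

    m1 and m2 map frozenset focal elements to masses; an intersection is treated
    as empty (conflicting) exactly when the two frozensets are disjoint.
    """
    combined = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        joint = mx * my
        if inter:
            # Conjunctive consensus m_wedge of Eq. (4).
            combined[inter] = combined.get(inter, 0.0) + joint
        elif mx + my > 0:
            # Partial conflict: X receives mx^2 my / (mx + my), Y receives my^2 mx / (mx + my).
            combined[x] = combined.get(x, 0.0) + mx * joint / (mx + my)
            combined[y] = combined.get(y, 0.0) + my * joint / (mx + my)
    return combined


A, B = frozenset({"A"}), frozenset({"B"})
m1 = {A: 0.6, B: 0.3, A | B: 0.1}
m2 = {A: 0.2, B: 0.7, A | B: 0.1}
m = pcr6_two_sources(m1, m2)
print({tuple(sorted(k)): round(v, 4) for k, v in m.items()})
# {('A',): 0.4178, ('B',): 0.5722, ('A', 'B'): 0.01}
```

In this toy example the total conflict (0.48) is redistributed only to A and B, the two elements involved in it, so the combined masses still sum to one.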



3. Methodology 

The proposed combination methodology shown in Fig. 1 is composed of two individual systems using SVM classifiers. Each one is trained on its own source of information, the two sources providing complementary kinds of features, and their outputs are combined through the PCR6 rule. In the following, we describe each module of our system.
Figure 1. Structure of the combination scheme using SVM and DSmT



3.1. Classification based on SVM 

The classification based on SVMs has been widely used in many pattern recognition applications, such as handwritten digit recognition [Cher, 2007]. The SVM is a learning method introduced by Vapnik et al. [Vapn, 1995], which seeks an optimal hyperplane separating two classes. Its concept is based on maximizing the margin, i.e., the distance between the hyperplane and the closest points of each class, which tends to reduce the misclassification error on both the training and test sets. Basically, SVMs were defined for separating two classes linearly. When the data are not linearly separable, a kernel function K is used: any function that satisfies Mercer's conditions is eligible as an SVM kernel [Vapn, 1995]. Examples of such kernels are the sigmoid kernel, the polynomial kernel, and the Radial Basis Function (RBF) kernel. The decision function $f: \mathbb{R}^p \to \mathbb{R}$ is then expressed in terms of a kernel expansion as:

$$ f(x) = \sum_{k=1}^{S_v} \alpha_k \, y_k \, K(x, x_k) + b \tag{5} $$

where $\alpha_k$ are the Lagrange multipliers, $S_v$ is the number of support vectors $x_k$, which are training data with labels $y_k \in \{-1, +1\}$, such that $0 < \alpha_k \le C$; C is a user-defined parameter that controls the trade-off between the machine complexity and the number of non-separable points [Huan, 2002]; and the bias b is a scalar computed using any support vector.

Finally, for a two-class problem, test data are classified according to:

$$ x \in \begin{cases} \text{class } (+1) & \text{if } f(x) \ge 0, \\ \text{class } (-1) & \text{otherwise.} \end{cases} \tag{6} $$
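A minimal Python sketch of the kernel expansion in Eq. (5) and the sign rule in Eq. (6) follows. It assumes the RBF parameterization $K(x, z) = \exp(-\lVert x - z \rVert^2 / (2\sigma^2))$, which the text does not specify, and uses hand-picked support vectors, multipliers and bias rather than a trained model; the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2)); this parameterization is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, bias, sigma):
    """Kernel expansion f(x) = sum_k alpha_k y_k K(x, x_k) + b over the support vectors."""
    return sum(alpha * y * rbf_kernel(x, sv, sigma)
               for sv, y, alpha in zip(support_vectors, labels, alphas)) + bias


def svm_classify(x, support_vectors, labels, alphas, bias, sigma):
    """Two-class rule: class +1 if f(x) >= 0, class -1 otherwise."""
    f = svm_decision(x, support_vectors, labels, alphas, bias, sigma)
    return 1 if f >= 0 else -1


# Toy model with hand-picked (not trained) support vectors, multipliers and bias.
svs, ys, alphas = [(0.0, 0.0), (2.0, 0.0)], [+1, -1], [1.0, 1.0]
print(svm_classify((0.2, 0.1), svs, ys, alphas, 0.0, 1.0))   # 1
print(svm_classify((1.9, -0.1), svs, ys, alphas, 0.0, 1.0))  # -1
```

The real-valued output of `svm_decision`, not just its sign, is what the calibration step below turns into a posterior probability.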



The extension of the SVM to multi-class classification is performed according to the One-Against-All (OAA) implementation [Cort, 1995]. Consider a set of N training samples separable into n classes, $\{(x_k, y_k^i) \in \mathbb{R}^p \times \{\pm 1\};\ k = 1, \ldots, N;\ i = 1, \ldots, n\}$. The principle consists in separating each class from all the other classes. Consequently, n SVMs are required for solving an n-class problem.

3.2. Classification Based On DSmT 

The proposed classification based on DSmT is presented in Fig. 2 and is conducted in three steps: i) estimation of masses, ii) combination of masses through the PCR6 combination rule, and iii) the decision rule.




Figure 2. DSmT-based parallel combination for multi-class classification
3.2.1. Estimation of Masses

Estimating masses becomes more difficult when weights are assigned to compound classes [Lowr, 1991]. Therefore, transfer models of the mass function have been proposed, whose aim is to distribute the initial masses over the simple and compound classes associated with each source. Here, the estimation of masses is performed in two steps: i) assignment of membership degrees to each simple class through the sigmoid transformation proposed by Platt [Plat, 1999], and ii) estimation of the masses of the simple classes and of their complementary classes using a supervised model.



• Calibration of the SVM outputs: Although the standard SVM is a very discriminative classifier, its output values are not calibrated, which prevents an appropriate combination of two sources of information. An interesting alternative, proposed in [Plat, 1999], transforms the SVM outputs into posterior probabilities. Given a training set of instance-label pairs $\{(x_k, y_k),\ k = 1, \ldots, N\}$, where $x_k \in \mathbb{R}^p$ and $y_k \in \{-1, +1\}$, the unthresholded output of an SVM is a distance measure between a test pattern and the decision boundary, as given in (5). However, there is no clear relationship between this output and the posterior class probability $p(y = +1 \mid x)$ that the pattern x belongs to the class y = +1. A possible estimate of this probability can be obtained [Plat, 1999] by modeling the distributions $p(f \mid y = +1)$ and $p(f \mid y = -1)$ of the SVM output f(x) with Gaussian distributions of equal variance and then computing the probability of the class given the output by Bayes' rule. This yields a sigmoid for estimating probabilities:

$$ p(y = +1 \mid x) = \frac{1}{1 + \exp\big(A \times f(x) + B\big)} \tag{7} $$

Parameters A and B are tuned by minimizing the negative log-likelihood of the training data:

$$ -\sum_{k=1}^{N} \big[ t_k \log(Q_k) + (1 - t_k) \log(1 - Q_k) \big] \tag{8} $$

where $Q_k = p(y_k = +1 \mid x_k)$ and

$$ t_k = \frac{y_k + 1}{2} $$

denotes the probability target.
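A minimal Python sketch of fitting A and B in Eqs. (7)-(8) is given below. We use a plain Newton iteration with backtracking, which is our choice and not the optimizer described in [Plat, 1999], and the names `fit_platt` and `platt_prob` are hypothetical. By default it uses the targets $t_k = (y_k + 1)/2$ of the text; when the training outputs are perfectly separated by the threshold, Eq. (8) with these hard targets has no finite minimizer, so the sketch also offers Platt's smoothed targets $(N_+ + 1)/(N_+ + 2)$ and $1/(N_- + 2)$ as an option.

```python
import math


def platt_prob(f, a, b):
    """Eq. (7): P(y = +1 | x) = 1 / (1 + exp(A f(x) + B)), computed without overflow."""
    z = a * f + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def _nll(a, b, f, t):
    """Eq. (8): negative log-likelihood, using a stable log(1 + e^z)."""
    total = 0.0
    for fk, tk in zip(f, t):
        z = a * fk + b
        softplus = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
        # -[t log p + (1 - t) log(1 - p)] with log p = -softplus and log(1 - p) = z - softplus.
        total += tk * softplus + (1.0 - tk) * (softplus - z)
    return total


def fit_platt(f, y, iters=100, regularized_targets=False):
    """Fit (A, B) by Newton's method with backtracking line search on Eq. (8)."""
    n_pos = sum(1 for v in y if v > 0)
    n_neg = len(y) - n_pos
    if regularized_targets:
        hi, lo = (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0)
        t = [hi if v > 0 else lo for v in y]
    else:
        t = [(v + 1) / 2.0 for v in y]  # t_k = (y_k + 1) / 2, as in the text
    a, b = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for fk, tk in zip(f, t):
            p = platt_prob(fk, a, b)
            d, w = tk - p, p * (1.0 - p)
            ga, gb = ga + d * fk, gb + d  # gradient of Eq. (8) w.r.t. (A, B)
            haa, hab, hbb = haa + w * fk * fk, hab + w * fk, hbb + w  # Hessian
        if abs(ga) < 1e-10 and abs(gb) < 1e-10:
            break
        haa, hbb = haa + 1e-12, hbb + 1e-12  # tiny ridge for numerical safety
        det = haa * hbb - hab * hab
        da, db = (hbb * ga - hab * gb) / det, (haa * gb - hab * ga) / det
        step, old = 1.0, _nll(a, b, f, t)
        while step > 1e-10 and _nll(a - step * da, b - step * db, f, t) > old:
            step /= 2.0
        a, b = a - step * da, b - step * db
    return a, b


f = [-2.0, -1.0, -0.5, -0.3, 0.3, 0.5, 1.0, 2.0]  # toy SVM outputs
y = [-1, -1, 1, 1, -1, -1, 1, 1]                  # overlapping labels
a, b = fit_platt(f, y)
print(a < 0, platt_prob(2.0, a, b) > 0.5)  # True True
```

A negative A makes the probability increase with the SVM output, as expected for the positive class.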



• Supervised model: Denoting by $m_1(.)$ and $m_2(.)$ the gbbas provided by two distinct information sources $S_1$ (first descriptor) and $S_2$ (second descriptor), let F be the set of focal elements of each source, such that $F = \{\theta_0, \theta_1, \ldots, \theta_{n-1}, \bar{\theta}_0, \bar{\theta}_1, \ldots, \bar{\theta}_{n-1}\}$. Each class $\theta_i$ is separated from its complementary class $\bar{\theta}_i$ by the SVM-OAA multi-class implementation, the singletons corresponding to the different pattern classes being assumed to be known. Therefore, each compound element $A_i \notin F$ has a mass $m_1(A_i)$ equal to zero; on the other hand, the mass of the complementary element $\bar{\theta}_i = \bigcup_{0 \le j \le n-1,\, j \ne i} \theta_j$ is different from zero and represents the mass of partial ignorance. The same reasoning is applied to the classes issued from the second source $S_2$ and $m_2(.)$. Hence, both gbbas $m_1(.)$ and $m_2(.)$ are given as follows:



$$ m_b(\theta_i) = \frac{P_b(\theta_i \mid x)}{Z_b}, \quad \forall \theta_i \in F \tag{9} $$

$$ m_b(\bar{\theta}_i) = \frac{\sum_{j=0,\, j \ne i}^{n-1} P_b(\theta_j \mid x)}{Z_b}, \quad \forall \bar{\theta}_i \in F \tag{10} $$

$$ m_b(A_i) = 0, \quad \forall A_i \in \Phi = D^\Theta \setminus F \tag{11} $$

where $Z_b = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} P_b(\theta_j \mid x)$ represents the normalization factor introduced in the axiomatic approach in order to respect the mass definition, and $P_b$ are the posterior probabilities issued from the first source (b = 1) and the second source (b = 2), respectively. They are given for a test pattern x as follows:

$$ P_b(\theta_i \mid x) = \frac{1}{1 + \exp\big(A_{ib} \times f_{ib}(x) + B_{ib}\big)} \tag{12} $$

$A_{ib}$ and $B_{ib}$ are the parameters of the sigmoid function, tuned by minimizing the negative log-likelihood during training for each class of patterns $\theta_i$, and $f_{ib}(x)$ is the output of the i-th binary classifier $\text{SVM}_i^b$ issued from the source $S_b$, such that $i = 0, 1, \ldots, n-1$ and $b \in \{1, 2\}$.

In summary, the masses of all elements $A_i \in D^\Theta$ allocated by each information source $S_b$ (b = 1, 2) are obtained according to the following steps:

1. Define a frame of discernment $\Theta = \{\theta_0, \theta_1, \ldots, \theta_{n-1}\}$.

2. Classify a pattern x through the SVM-OAA implementation.

3. Transform each SVM output into a posterior probability using Eq. (12).

4. Compute the masses associated with each class and its complementary class using Eq. (9) and Eq. (10), respectively.
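Steps 3 and 4 can be sketched in Python as follows, starting from already calibrated posteriors $P_b(\theta_i \mid x)$. The helper `oaa_masses` is hypothetical, focal elements are encoded as `frozenset` objects of class indices (an assumption of this sketch), and we assume the normalization factor $Z_b = n \sum_j P_b(\theta_j \mid x)$, the value that makes the 2n masses of Eqs. (9)-(10) sum to one. For n = 2 the complement of $\theta_0$ is $\theta_1$ itself, so the sketch accumulates rather than overwrites.

```python
def oaa_masses(posteriors):
    """Masses of each class and of its complement from OAA posteriors P_b(theta_i | x).

    Keys are frozensets of class indices: {i} stands for theta_i and
    {0, ..., n-1} minus {i} for its complementary class.
    """
    n = len(posteriors)
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posterior probabilities must not all be zero")
    z = n * total  # assumed Z_b: makes the 2n masses sum to one
    frame = frozenset(range(n))
    masses = {}
    for i, p in enumerate(posteriors):
        singleton, complement = frozenset({i}), frame - {i}
        masses[singleton] = masses.get(singleton, 0.0) + p / z
        # The complement collects the posteriors of all the other classes (j != i).
        masses[complement] = masses.get(complement, 0.0) + (total - p) / z
    return masses


m = oaa_masses([0.8, 0.1, 0.3])  # hypothetical calibrated outputs for 3 classes
print(round(sum(m.values()), 12))  # 1.0
```

Only the 2n elements of F receive mass, so every other element of the hyper-power set keeps a zero mass as required by Eq. (11).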

3.2.2. Combination of masses 

In order to manage the conflict generated by the two information sources $S_1$ and $S_2$ (i.e., both SVM classifications), the combined masses are computed as follows:
$$ m_c = m_1 \oplus m_2 \tag{13} $$



where $\oplus$ denotes the PCR6 combination rule as given in (1). In the context of some pattern recognition applications, such as handwritten digit recognition, we take as constraints the propositions $\theta_i \cap \theta_j \equiv \emptyset$, $\forall \theta_i, \theta_j \in \Theta$ such that $i \ne j$, which allow separating each pair of classes belonging to $\Theta$.

Therefore, the hyper-power set $D^\Theta$ is reduced to the set $F = \{\theta_0, \ldots, \theta_{n-1}, \bar{\theta}_0, \ldots, \bar{\theta}_{n-1}\}$, which defines a particular case of Shafer's model. Thus, the conflict $K_c \in [0, 1]$ measured between the two sources is defined as:

$$ K_c = \sum_{\substack{A_k, A_l \in F \\ A_k \cap A_l \in \Phi}} m_1(A_k)\, m_2(A_l) \tag{14} $$

where $\Phi = D^\Theta \setminus F$ is the set of all relatively and absolutely empty elements, and $m_1(.)$ and $m_2(.)$ represent the generalized basic belief assignments provided by the two information sources $S_1$ and $S_2$, respectively.
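A minimal Python sketch of the conflict in Eq. (14) follows. It reflects our reading that, under the exclusivity constraints $\theta_i \cap \theta_j \equiv \emptyset$, a pair of focal elements is conflicting exactly when the two sets of classes are disjoint; focal elements are encoded as `frozenset` objects and the function name `conflict` is hypothetical.

```python
from itertools import product


def conflict(m1, m2):
    """Total mass m1(A_k) m2(A_l) over pairs of focal elements with an empty intersection.

    Focal elements are frozensets of class indices; under the exclusivity constraints
    an intersection is empty exactly when the two frozensets are disjoint.
    """
    return sum(ma * mb for (a, ma), (b, mb) in product(m1.items(), m2.items())
               if not (a & b))


# Three classes: source 1 supports theta_0, source 2 supports theta_1.
m1 = {frozenset({0}): 0.5, frozenset({1, 2}): 0.5}
m2 = {frozenset({1}): 0.4, frozenset({0, 2}): 0.6}
print(conflict(m1, m2))  # 0.2
```

Here only the pair $(\theta_0, \theta_1)$ is disjoint; pairs involving complementary classes still share at least one class and therefore do not contribute to $K_c$.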

3.2.3. Decision rule 

The assignment of a pattern to one of the simple classes of $\Theta$ is performed using a statistical classification technique. First, the combined beliefs are converted into a probability measure using a new probabilistic transformation, called the Dezert-Smarandache probability (DSmP), which maps a belief measure to a subjective probability measure [Smar, 2009] and is defined as:

$$ \mathrm{DSmP}_\varepsilon(\theta_i) = m_c(\theta_i) + \big(m_c(\theta_i) + \varepsilon\big) \sum_{\substack{A_j \in D^\Theta \\ A_j \supset \theta_i \\ C_M(A_j) \ge 2}} \frac{m_c(A_j)}{\sum_{\substack{A_k \subset A_j \\ C_M(A_k) = 1}} m_c(A_k) + \varepsilon \, C_M(A_j)} \tag{15} $$

where $i \in \{0, 1, \ldots, 9\}$, $\varepsilon > 0$ is a tuning parameter, M is Shafer's model for $\Theta$, and $C_M(A_k)$ denotes the DSm cardinal of $A_k$ [Smar, 2004]. The maximum likelihood (ML) test is then used for decision making as follows:

$$ x \in \theta_i \quad \text{if} \quad \mathrm{DSmP}_\varepsilon(\theta_i) = \max_{0 \le j \le n-1} \mathrm{DSmP}_\varepsilon(\theta_j) \tag{16} $$

where x is the test pattern characterized by both descriptors used during the feature generation step, and $\varepsilon$ is fixed to 0.001 in the decision measure given by (15).
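A minimal Python sketch of Eqs. (15)-(16) under Shafer's model is shown below, where the DSm cardinal $C_M(A)$ reduces to the number of classes in A. Combined masses are given as a dictionary keyed by `frozenset` focal elements (an assumption of this sketch), the function names are hypothetical, and $\varepsilon = 0.001$ follows the text.

```python
def dsmp(masses, classes, eps=0.001):
    """DSmP_eps(theta_i) for each class label under Shafer's model (C_M(A) = |A|).

    masses maps frozensets of class labels to combined masses m_c.
    """
    singleton_mass = {c: masses.get(frozenset({c}), 0.0) for c in classes}
    probs = {}
    for c in classes:
        value = singleton_mass[c]
        for focal, m in masses.items():
            if c in focal and len(focal) >= 2:
                # Share of m_c(A_j) transferred to theta_c, proportional to m_c(theta_c) + eps.
                denom = sum(singleton_mass[k] for k in focal) + eps * len(focal)
                value += (singleton_mass[c] + eps) * m / denom
        probs[c] = value
    return probs


def decide(probs):
    """Assign the pattern to the class with the largest DSmP value."""
    return max(probs, key=probs.get)


m_c = {frozenset({0}): 0.3, frozenset({1}): 0.1, frozenset({2}): 0.1,
       frozenset({1, 2}): 0.3, frozenset({0, 2}): 0.2}  # toy combined masses
p = dsmp(m_c, [0, 1, 2])
print(decide(p))  # 0
```

Because each compound mass is fully redistributed among the singletons it contains, the DSmP values sum to one, and the small $\varepsilon$ keeps singletons with zero mass from being excluded outright.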
4. Experimental results

4.1. Database description and performance evaluation 

To evaluate the effective use of the DSmT for multi-class classification, we consider a case study on handwritten digit recognition. For this, we select the well-known US Postal Service (USPS) database, which contains normalized grey-level handwritten digit images of 10 numeral classes extracted from US postal envelopes. All images are segmented and normalized to a size of 16 × 16 pixels. There are 7291 training samples and 2007 test samples, some of which are corrupted and difficult to classify correctly (Fig. 3). The partition of the database into training and testing samples for each class is reported in Table 1.


Figure 3. Some samples with their alleged classes from USPS database. 



Table 1. Partitioning of the USPS dataset

Classes    0      1      2      3      4      5      6      7      8      9
Training   1194   1005   731    658    652    556    664    645    542    644
Testing    359    264    198    166    200    160    170    147    166    177


To evaluate the performance of handwritten digit classification, two error measures are considered: the Error Rate per Class (ERC) and the Mean Error Rate (MER) over all classes. Both errors are expressed in %.
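The two measures can be sketched in Python as follows. The text does not state how the MER averages over classes, so this sketch assumes that the ERC of a class is the percentage of its test samples that are misclassified and that the MER is the unweighted mean of the ERCs; an overall error rate over all test samples would weight classes by their size instead. The helper name `error_rates` is hypothetical.

```python
def error_rates(y_true, y_pred, classes):
    """Per-class error rates (ERC, %) and their unweighted mean (MER, %), as assumed above."""
    erc = {}
    for c in classes:
        idx = [k for k, t in enumerate(y_true) if t == c]
        wrong = sum(1 for k in idx if y_pred[k] != c)
        erc[c] = 100.0 * wrong / len(idx) if idx else 0.0
    mer = sum(erc.values()) / len(erc)
    return erc, mer


erc, mer = error_rates([0, 0, 1, 1], [0, 1, 1, 1], [0, 1])
print(erc, mer)  # {0: 50.0, 1: 0.0} 25.0
```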

4.2. Pre-processing

The acquired image of an isolated digit should be processed to facilitate feature generation. In our case, the pre-processing module includes a binarization step using Otsu's method [Otsu, 1979], which eliminates the homogeneous background of the isolated digit and keeps the foreground information. The processed digit is then used for recognition without size normalization.
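Otsu's method chooses the grey-level threshold that maximizes the between-class variance of the image histogram. The minimal Python sketch below works on 8-bit grey levels stored in plain lists; the function names are hypothetical, and whether the digit strokes are darker or brighter than the background in the processed USPS images is an assumption exposed through the `foreground_is_dark` flag.

```python
def otsu_threshold(pixels):
    """Grey level t (0-255) maximizing the between-class variance; classes are <= t and > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to the between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, foreground_is_dark=True):
    """Binarize a 2-D list of 8-bit grey levels with Otsu's threshold (1 = foreground)."""
    t = otsu_threshold([p for row in image for p in row])
    if foreground_is_dark:
        return [[1 if p <= t else 0 for p in row] for row in image]
    return [[1 if p > t else 0 for p in row] for row in image]


print(binarize([[10, 200], [200, 10]]))  # [[1, 0], [0, 1]]
```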

4.3. Feature Generation

The objective of the feature generation step is to highlight the relevant information initially present in the raw data. Thus, an appropriate choice of descriptor significantly improves the accuracy of the recognition system. In this study, we use a collection of popular feature generation methods, which can be categorized into background features [Brit, 2004], [Cava, 2006], foreground features [Brit, 2004], [Cava, 2006], geometric features [Cher, 2007], and uniform grid features [Fava, 1996], [Abba, 2011].
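The exact definition of the uniform grid features is given in the cited works, not here. As an illustration only, the Python sketch below assumes one common zoning variant: the binarized digit is divided into an n × m grid and the fraction of foreground pixels in each cell is used as a feature. The helper name `uniform_grid_features` and the density definition are assumptions; the grid sizes n and m correspond to the per-class parameters reported in Table 2.

```python
def uniform_grid_features(image, n_rows, n_cols):
    """Fraction of foreground pixels in each cell of an n_rows x n_cols uniform grid.

    image is a 2-D list of 0/1 values (e.g. a binarized digit); cell borders are
    rounded down so that image sizes that are not multiples of the grid still work.
    """
    h, w = len(image), len(image[0])
    features = []
    for r in range(n_rows):
        r0, r1 = r * h // n_rows, (r + 1) * h // n_rows
        for c in range(n_cols):
            c0, c1 = c * w // n_cols, (c + 1) * w // n_cols
            cell = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
print(uniform_grid_features(img, 2, 2))  # [1.0, 0.0, 0.0, 0.0]
```

Because the digits are not resized, rounding the cell borders keeps the feature vector length fixed at n × m regardless of the image size.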
4.4. Validation of SVM Models

An SVM model is produced for each class according to the descriptor used. The training dataset is partitioned into two equal subsets of samples, which are used for training and validating each binary SVM, respectively. The validation phase thus allows finding the optimal hyperparameters of the ten SVM models. In our case, the RBF kernel is selected for the experiments. Furthermore, both the regularization and RBF kernel parameters $(C, \sigma)$ of each SVM are tuned experimentally during the training phase in such a way that the misclassification error on the training subset is zero and the validation error is minimal for each SVM separating a simple class from its complementary class.
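The selection rule described above can be sketched as a small grid search in Python. The callables `train_error` and `validation_error` are hypothetical placeholders for training a binary SVM on the training half and evaluating it; the sketch only encodes the rule itself: keep the pairs $(C, \sigma)$ with zero training error and pick the one with the lowest validation error. The grid size (n, m) of the uniform grid features could be added as further loop dimensions in the same way.

```python
from itertools import product


def select_hyperparameters(c_values, sigma_values, train_error, validation_error):
    """Return (validation error, C, sigma) for the best admissible pair, or None.

    A pair is admissible only if it yields zero misclassification on the training subset.
    """
    best = None
    for c, sigma in product(c_values, sigma_values):
        if train_error(c, sigma) > 0:
            continue
        err = validation_error(c, sigma)
        if best is None or err < best[0]:
            best = (err, c, sigma)
    return best


# Toy stand-ins for training and validating a binary SVM (hypothetical).
best = select_hyperparameters([1, 2, 3, 4, 5], [1, 3.5, 5],
                              lambda c, s: 0.0 if c >= 3 else 0.1,
                              lambda c, s: abs(s - 3.5) + 0.01 * c)
print(best[1:])  # (3, 3.5)
```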

Table 2 shows an example of the optimal parameters obtained during the training and validation phases using the UG-SVMs classifier. The parameters n and m define the number of lines (vertical regions) and columns (horizontal regions) of the grid, respectively, which have been optimized during the validation phase for each SVM model. All these parameters are then used during the testing phase. ERCs and ERCc are the Error Rates per Class for the simple and complementary classes, respectively. As can be seen, the size of the uniform grid and the hyperparameters of each SVM must be tuned carefully in order to obtain a low error.



Table 2. Optimal parameters of the UG-SVMs classifier

SVM classifier   0      1      2      3      4      5      6      7      8      9
n                7      2      8      5      4      7      7      8      8      7
m                5      3      3      6      12     5      8      6      6      10
σ                3.5    1      3.5    4      3      3.5    4      3.5    5      4.5
C                5      3      4      5      4      4      2      4      3      5
ERCs (%)         2.0    1.0    4.6    5.7    15.6   10.0   2.7    5.5    11.8   4.0
ERCc (%)         0.6    1.1    0.4    0.3    0.1    0.3    0.1    0.1    0.3    0.4


4.5. Quantitative results and discussion

The testing phase is performed using all samples from the test dataset. First, the performance of handwritten digit recognition is evaluated for an appropriate choice of descriptors using the SVM classifiers; then, we evaluate the combination of the SVM classifiers within the DSmT framework.

4.5.1. Comparative analysis of features

The choice of complementary features is an important step to ensure an efficient combination. Indeed, the DSmT-based combination offers accurate performance when the selected features are complementary. Hence, in this section we evaluate the performance of several features in order to select the best ones for combination through the DSmT. For this, we evaluate each SVM-OAA implementation using Foreground Features (FF), Background Features (BF), Geometric Features (GF), Uniform Grid Features (UGF), and descriptors obtained by concatenating at least two simple descriptors, such as (BF,FF), (BF,FF,GF) and (UGF,BF,FF,GF). The experiments have shown that an appropriate choice of both the descriptors and the concatenation order used to represent each digit class in the feature generation step yields a notable error reduction. Table 3 reports the mean error rates obtained with FF, UGF and the concatenated descriptors. When concatenating background and foreground (BF,FF) features, we observe a significant reduction of the MER: an absolute error rate reduction of 6.71% is obtained relative to FF alone. Furthermore, an additional absolute reduction of 1.5% is obtained when concatenating BF, FF and GF. This shows that BF, FF and GF are complementary and well suited for concatenation. In contrast, when UGF is concatenated with BF, FF and GF, the MER increases by 2.73% compared with UGF alone. This shows that concatenation does not always improve classification performance. Thus, we expect the UGF and (BF,FF,GF) descriptors to be more suitable for combination through the DSmT.

Table 3. Mean error rates of the SVM classifiers using different methods of feature generation 



Descriptor              Error rate (%)
(a) FF                  18.87
(b) (BF,FF)             12.16
(c) (BF,FF,GF)          10.66
(d) UGF                 6.98
(e) (UGF,BF,FF,GF)      9.71
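The error-rate changes discussed above are absolute differences between rows of Table 3, expressed in percentage points. A
small check, using a hypothetical helper `mer_change`, reproduces them from the tabulated values:

```python
# Mean error rates (%) from Table 3
MER = {
    "FF": 18.87,
    "(BF,FF)": 12.16,
    "(BF,FF,GF)": 10.66,
    "UGF": 6.98,
    "(UGF,BF,FF,GF)": 9.71,
}


def mer_change(before, after):
    """Absolute change in MER, in percentage points (negative = improvement)."""
    return round(MER[after] - MER[before], 2)


print(mer_change("FF", "(BF,FF)"))           # -6.71
print(mer_change("(BF,FF)", "(BF,FF,GF)"))   # -1.5
print(mer_change("UGF", "(UGF,BF,FF,GF)"))   # 2.73
```

Expressed relative to FF, the first gain corresponds to roughly a 36% relative reduction (6.71 / 18.87).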



3.5.2. Performance evaluation of the proposed combination framework 

In these experiments, we evaluate handwritten digit recognition based on the combination of SVM classifiers through the
DSmT. The proposed combination framework exploits the redundant and complementary nature of the (BF,FF,GF)- and
UGF-based descriptors and manages the conflict arising from the outputs of the SVM classifiers.

Decision making is performed only on the simple classes belonging to the frame of discernment. Hence, both the combination
process and the computation of the decision measures take into account the masses assigned to the complementary classes
representing the partial ignorance, $\bar{\theta}_i = \bigcup_{0 \le j \le n-1,\ j \ne i} \theta_j$, and to the intersections
$\theta_i \cap \theta_j$ with $i \ne j$. To appreciate the advantage of combining two sources of information through the
DSmT-based algorithm, Figure 4 shows the distribution of the conflict measured, for each test sample, between both SVM-OAA
implementations using the (BF,FF,GF)- and UGF-based descriptors for the 10 digit classes $\{\theta_i, i = 0,1,\ldots,9\}$.
Table 4 reports the minimal and maximal values of the conflict $K_{c_i}, i = 0,1,\ldots,9$, generated through the supervised
model after the combination process; these values correspond to the mass assigned to the empty set. As we can see, the
highest maximal conflict is obtained for digit 4, while the lowest minimal conflict is obtained for digit 9.
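To make the meaning of $K_{c_i}$ concrete, the sketch below applies the unnormalized conjunctive rule to two belief
assignments whose focal elements are a single class and its complementary class; exclusive constraints are modeled by
representing each focal element as a set of digit labels, so disjoint elements intersect to the empty set. The helper
`oaa_bba` and its masses are hypothetical placeholders, not the supervised mass-estimation model of this work.

```python
from itertools import product

CLASSES = frozenset(range(10))  # digits 0..9, mutually exclusive


def oaa_bba(i, p):
    """Hypothetical BBA from one SVM-OAA output: m(theta_i)=p, m(complement)=1-p."""
    return {frozenset([i]): p, CLASSES - {i}: 1.0 - p}


def conjunctive(m1, m2):
    """Unnormalized conjunctive rule; the mass on frozenset() is the conflict K."""
    out = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y  # disjoint (exclusive) focal elements intersect to the empty set
        out[z] = out.get(z, 0.0) + mx * my
    return out


# Source 1 favours digit 4, source 2 favours digit 9
m12 = conjunctive(oaa_bba(4, 0.9), oaa_bba(9, 0.8))
print(round(m12[frozenset()], 4))  # conflict K = 0.72
```

Only the pair {4} and {9} is disjoint here, so the whole product 0.9 x 0.8 lands on the empty set; the other three products
remain on {4}, {9} and the eight-digit set excluding 4 and 9.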
Figure 4. Conflict between both SVM classifiers using (BF,FF,GF)- and UGF-based descriptors for the ten digit classes
$\theta_i, i = 0,1,\ldots,9$. Panels (a) to (j) show the measured conflict for the digits belonging to $\theta_0$ to $\theta_9$,
respectively.



Table 4. Ranges of conflict variations measured between both SVM-OAA implementations using (BF,FF,GF)- and UGF-based
descriptors

Class    Minimal conflict (×10^-5)    Maximal conflict (×10^-2)
0        2.149309                     2.9933
1        6.999035                     2.9964
2        2.747717                     2.9992
3        2.936855                     2.9994
4        0.494599                     3.0000
5        1.868961                     2.9970
6        2.537015                     2.9887
7        2.826402                     2.9983
8        1.485899                     2.9910
9        0.276778                     2.9999



For an objective evaluation, Table 5 reports the ERC and MER produced by three SVM-OAA implementations using UGF,
(BF,FF,GF), and the descriptor resulting from the concatenation of UGF and (BF,FF,GF) (i.e. combination at the feature
level), and finally by the PCR6 combination rule (i.e. combination at the measure level) applied to the (BF,FF,GF)- and
UGF-based descriptors.

Table 5. Error rates of the proposed framework with PCR6 combination rule using (BF,FF,GF) and UGF descriptors

              Descriptor              Concatenation      Combination rule
              (BF,FF,GF)    UGF       (UGF,BF,FF,GF)     PCR6
ERC (%)
  0           6.69          1.95      9.75               1.95
  1           4.55          3.79      3.79               3.03
  2           12.63         8.08      3.54               6.06
  3           17.47         10.84     18.67              10.84
  4           20.00         11.50     19.50              9.00
  5           16.87         10.00     10.62              7.50
  6           2.94          5.29      4.71               3.53
  7           8.84          8.16      8.84               4.76
  8           12.05         10.84     10.24              6.63
  9           10.73         6.21      10.17              5.65
MER (%)       10.66         6.98      9.71               5.43



Overall, the proposed framework using the PCR6 combination rule is more suitable than the individual SVM-OAA
implementations, since it yields an MER of 5.43%, compared with 6.98% for the best individual implementation (UGF) and
9.71% for the concatenation. However, a closer inspection of each class shows that the PCR6 combination rule keeps or
reduces the ERC in most cases, except for the samples belonging to classes $\theta_2$ and $\theta_6$. This weaker performance
is attributed to the poor characterization of these classes by both the UG- and (BF,FF,GF)-based descriptors. In other words,
the PCR6 combination is not reliable when the complementary information provided by the two descriptors is not correctly
preserved.
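The per-class statement can be checked directly against Table 5. The sketch below flags the classes for which the PCR6
ERC exceeds the lowest ERC among the other three implementations:

```python
# ERC (%) per class from Table 5: ((BF,FF,GF), UGF, concatenation, PCR6)
ERC = {
    0: (6.69, 1.95, 9.75, 1.95),
    1: (4.55, 3.79, 3.79, 3.03),
    2: (12.63, 8.08, 3.54, 6.06),
    3: (17.47, 10.84, 18.67, 10.84),
    4: (20.00, 11.50, 19.50, 9.00),
    5: (16.87, 10.00, 10.62, 7.50),
    6: (2.94, 5.29, 4.71, 3.53),
    7: (8.84, 8.16, 8.84, 4.76),
    8: (12.05, 10.84, 10.24, 6.63),
    9: (10.73, 6.21, 10.17, 5.65),
}

# Classes where PCR6 does worse than the best of the other three columns
worse = [c for c, rates in ERC.items() if rates[3] > min(rates[:3])]
print(worse)  # [2, 6]
```

For class 2 the concatenation (3.54%) beats PCR6, and for class 6 the (BF,FF,GF) descriptor (2.94%) does; for classes 0
and 3, PCR6 ties the best individual result.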

Thus, the PCR6 combination rule correctly manages the conflict generated by the SVM-OAA implementations, even when they
produce very small conflict values (see Table 4), mainly for samples belonging to $\theta_8$. The DSmT is therefore more
appropriate for this handwritten digit recognition problem. Indeed, the PCR6 combination rule redistributes each partial
conflicting mass only to the elements involved in that partial conflict, whereas the DST redistributes the beliefs through a
simple normalization by $1 - K_{c_w}, w = 0,1,\ldots,9$, in the combination process. After redistribution, the combined mass is
transformed into the DSm probability (DSmP), and the maximum likelihood (ML) test is used for decision making. Finally, the
proposed DSmT-based algorithm is the most stable across all experiments, whereas the recognition accuracies of the two
individual SVM classifiers vary significantly.
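The redistribution and decision steps can be sketched as follows, assuming Shafer's model (focal elements are sets of
mutually exclusive labels) and only two sources, for which PCR6 coincides with PCR5. The function names, the value of ε in
the DSmP transformation, and the toy masses are illustrative assumptions, not the exact configuration used in the
experiments.

```python
from itertools import product


def pcr6_two_sources(m1, m2):
    """PCR6 fusion of two BBAs under exclusive constraints (equals PCR5 for two sources).

    Focal elements are frozensets of class labels; an empty intersection
    is a partial conflict m1(x) * m2(y).
    """
    fused = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        prod = mx * my
        inter = x & y
        if inter:
            fused[inter] = fused.get(inter, 0.0) + prod
        elif prod > 0.0:
            # Redistribute the partial conflict only to x and y,
            # proportionally to the masses that generated it.
            fused[x] = fused.get(x, 0.0) + mx * prod / (mx + my)
            fused[y] = fused.get(y, 0.0) + my * prod / (mx + my)
    return fused


def dsmp(m, classes, eps=1e-3):
    """DSmP_eps of each singleton under Shafer's model (cardinality = set size).

    Assumes every label appearing in a focal element is listed in classes.
    """
    single = {c: m.get(frozenset([c]), 0.0) for c in classes}
    prob = {}
    for c in classes:
        total = 0.0
        for x, mx in m.items():
            if c in x:
                den = sum(single[s] for s in x) + eps * len(x)
                total += (single[c] + eps) / den * mx
        prob[c] = total
    return prob


# Toy example on three classes: source 2 keeps some total ignorance
theta = frozenset({0, 1, 2})
m1 = {frozenset({0}): 0.6, frozenset({1, 2}): 0.4}
m2 = {frozenset({1}): 0.5, theta: 0.5}
fused = pcr6_two_sources(m1, m2)
p = dsmp(fused, sorted(theta))
decision = max(p, key=p.get)  # maximum-likelihood choice over simple classes
print(decision)  # 1
```

In this toy case the largest singleton mass after fusion belongs to class 0, yet the DSmP-based decision selects class 1
because the mass on {1, 2} is transferred almost entirely to class 1. With Dempster's rule, the conflicting mass 0.3 would
instead be discarded and every remaining mass rescaled by 1/(1 - 0.3).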

4. Conclusion and future work 

In this paper, we proposed an effective use of the DSmT for multi-class classification that jointly uses the SVM-OAA
implementation and a supervised model. Exclusive constraints are introduced through a direct estimation technique to
compute the belief assignments and reduce the number of focal elements. The proposed framework therefore drastically
reduces the computational complexity of the combination process for multi-class classification. A case study on
handwritten digit recognition shows that the proposed supervised model with the PCR6 rule yields the best performance
compared with the individual multi-class SVM classifiers, even when these provide uncalibrated outputs. As a continuation
of the present work, our next objective is to use one-class classifiers instead of the OAA implementation of SVM in order
to obtain a fixed number of focal elements within the DSmT combination process. This will yield a feasible computational
complexity regardless of the number of combined sources and the size of the discernment space.